factor 0
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Ganguly, Debargha, Singh, Vikash, Sankar, Sreehari, Zhang, Biyao, Zhang, Xuecen, Iyengar, Srinivasan, Han, Xiaotian, Sharma, Amit, Kalyanaraman, Shivkumar, Chaudhary, Vipin
Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals that Satisfiability Modulo Theories (SMT)-based autoformalization has a strongly domain-dependent impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), and that known UQ techniques, such as the entropy of token probabilities, fail to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find that uncertainty signals are task-dependent (e.g., grammar entropy for logic, with AUROC > 0.93). Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (by 14-100%) with minimal abstention, transforming LLM-driven formalization into a reliable engineering discipline.
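A minimal sketch of the two kinds of uncertainty signal the abstract contrasts, assuming grammar entropy is approximated as the summed entropy of an estimated PCFG's per-nonterminal rule distributions; the toy token distributions, rule counts, and SMT-LIB fragments below are illustrative stand-ins, not the paper's pipeline.

```python
# Sketch (not the paper's implementation): contrast mean token-level predictive
# entropy with a grammar-level signal estimated from production rules tallied
# across several sampled formalizations of the same prompt.
import math
from collections import Counter, defaultdict

def mean_token_entropy(token_dists):
    """Mean Shannon entropy of the per-token probability distributions."""
    ents = [-sum(p * math.log(p) for p in d.values() if p > 0) for d in token_dists]
    return sum(ents) / len(ents)

def grammar_entropy(rule_counts):
    """Sum over nonterminals of the entropy of their estimated rule distributions."""
    by_lhs = defaultdict(Counter)
    for (lhs, rhs), n in rule_counts.items():
        by_lhs[lhs][rhs] += n
    total = 0.0
    for rhs_counts in by_lhs.values():
        z = sum(rhs_counts.values())
        total += -sum((c / z) * math.log(c / z) for c in rhs_counts.values())
    return total

# Toy inputs: token distributions from one decoded sample, and production rules
# tallied across five sampled SMT-LIB outputs (disagreement -> nonzero entropy).
token_dists = [{"(assert": 0.9, "(declare-fun": 0.1}, {"(>": 0.6, "(>=": 0.4}]
rules = Counter({
    ("<cmd>", "(assert <term>)"): 5,
    ("<term>", "(> x 0)"): 3,
    ("<term>", "(>= x 1)"): 2,
})
print(f"token entropy:   {mean_token_entropy(token_dists):.3f}")
print(f"grammar entropy: {grammar_entropy(rules):.3f}")
```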
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Forecasting inflation using disaggregates and machine learning
Boaretto, Gilberto, Medeiros, Marcelo C.
This paper examines the effectiveness of several forecasting methods for predicting inflation, focusing on the aggregation of disaggregated forecasts, also known in the literature as the bottom-up approach. Taking the Brazilian case as an application, we consider different disaggregation levels for inflation and employ a range of traditional time series techniques as well as linear and nonlinear machine learning (ML) models to handle a larger number of predictors. For many forecast horizons, the aggregation of disaggregated forecasts performs just as well as survey-based expectations and models that forecast the aggregate directly. Overall, ML methods outperform traditional time series models in predictive accuracy, with outstanding performance in forecasting disaggregates. Our results reinforce the benefits of using models in a data-rich environment for inflation forecasting, including aggregating disaggregated forecasts from ML techniques, mainly during volatile periods. From the start of the COVID-19 pandemic onward, the random forest model based on both aggregate and disaggregated inflation achieves remarkable predictive performance at intermediate and longer horizons.
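A minimal sketch of the bottom-up idea the abstract describes, assuming hypothetical CPI component names, expenditure weights, and a simple lag structure; the random forest per component mirrors the kind of ML model mentioned, not the paper's exact specification.

```python
# Sketch of the bottom-up approach: forecast each (synthetic) CPI component from
# its own lags with a random forest, then aggregate with hypothetical weights.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
weights = {"food": 0.25, "housing": 0.35, "transport": 0.20, "services": 0.20}

def lag_matrix(series, n_lags=4):
    """Predict month t from the previous n_lags months."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

aggregate_forecast = 0.0
for name, w in weights.items():
    series = rng.normal(0.4, 0.3, size=120)   # fake monthly component inflation (%)
    X, y = lag_matrix(series)
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    one_step = rf.predict(series[-4:].reshape(1, -1))[0]
    aggregate_forecast += w * one_step        # bottom-up aggregation

print(f"aggregated headline forecast: {aggregate_forecast:.3f}%")
```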
- South America > Brazil > São Paulo (0.04)
- North America > United States > Illinois (0.04)
- North America > Mexico (0.04)
- (3 more...)
- Government (1.00)
- Banking & Finance > Economy (1.00)
- Health & Medicine (0.89)
- Consumer Products & Services (0.67)
Cohort Shapley value for algorithmic fairness
Mase, Masayoshi, Owen, Art B., Seiler, Benjamin B.
Cohort Shapley value is a model-free, game-theoretic method of variable importance that does not use any unobserved, and potentially impossible, feature combinations. We use it to evaluate algorithmic fairness, taking the well-known COMPAS recidivism data as our example. This approach allows one to identify, for each individual in a data set, the extent to which they were adversely or beneficially affected by their value of a protected attribute such as race. The method can do this even if race was not one of the original predictors and even without access to the proprietary algorithm that made the predictions. The grounding in game theory lets us define aggregate variable importance for a data set consistently with its per-subject definitions. We can investigate variable importance for multiple quantities of interest in the fairness literature, including false positive predictions.
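A minimal sketch of cohort Shapley for a single target subject, assuming exact matching on discrete features; the data, feature coding, and scores are synthetic stand-ins rather than the COMPAS data used in the paper.

```python
# Sketch: cohort Shapley for one subject. The value function v(S) is the mean
# score over the cohort of subjects matching the target on every feature in S,
# so only observed feature combinations are ever used.
from itertools import combinations
from math import factorial
import numpy as np

X = np.array([            # columns: race, priors_bucket, age_bucket (coded)
    [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 0],
    [0, 1, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1],
])
score = np.array([0.7, 0.3, 0.9, 0.5, 0.6, 0.8, 0.2, 0.4])  # quantity of interest
t = 2                                                        # target subject index
d = X.shape[1]

def cohort_mean(S):
    """v(S): mean score over subjects matching the target on every feature in S."""
    mask = np.ones(len(X), dtype=bool)
    for j in S:
        mask &= (X[:, j] == X[t, j])
    return score[mask].mean()

def shapley(j):
    """Exact Shapley value of feature j for subject t under the cohort value function."""
    others = [k for k in range(d) if k != j]
    total = 0.0
    for r in range(d):
        for S in combinations(others, r):
            w = factorial(r) * factorial(d - r - 1) / factorial(d)
            total += w * (cohort_mean(S + (j,)) - cohort_mean(S))
    return total

for j, name in enumerate(["race", "priors", "age"]):
    print(f"{name}: {shapley(j):+.3f}")
```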
- North America > United States > Florida > Broward County (0.04)
- North America > United States > Tennessee (0.04)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Research Report (1.00)
- Overview (0.93)
- Law (1.00)
- Leisure & Entertainment (0.68)
Predicting Gross Movie Revenue
'There is no terror in the bang, only in the anticipation of it' - Alfred Hitchcock. Yet there is everything in correctly anticipating the bang a movie will make at the box office. Movies are a high-profile, billion-dollar industry, and prediction of movie revenue can be very lucrative. Predicted revenues can be used for planning both the production and distribution stages. For example, projected gross revenue can be used to plan the remuneration of the actors and crew members as well as other parts of the budget [1]. The success or failure of a movie can depend on many factors: star power, release date, budget, MPAA (Motion Picture Association of America) rating, plot, and highly unpredictable human reactions. The sheer number of exogenous variables makes manual revenue prediction extremely difficult. However, in the era of computer and data science, volumes of data can be efficiently processed and modelled. Hence the tough job of predicting the gross revenue of a movie can be simplified with the help of modern computing power and the historical data available in movie databases [2].
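A minimal sketch of the kind of historical-data model the passage motivates, regressing gross revenue on a few of the factors it lists (budget, MPAA rating, release month); the data is synthetic and the column names and model choice are illustrative only.

```python
# Sketch: fit a simple regression on synthetic movie records and predict the
# gross of a hypothetical new release. Not tied to any particular movie database.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "budget_musd": rng.gamma(2.0, 30.0, n),
    "mpaa": rng.choice(["G", "PG", "PG-13", "R"], n),
    "release_month": rng.integers(1, 13, n),
})
# Fake gross: roughly proportional to budget plus noise, just to have a target.
gross = 2.2 * df["budget_musd"] + rng.normal(0, 25, n)

pre = ColumnTransformer([("cat", OneHotEncoder(), ["mpaa"])], remainder="passthrough")
model = make_pipeline(pre, LinearRegression()).fit(df, gross)

new_movie = pd.DataFrame({"budget_musd": [80.0], "mpaa": ["PG-13"], "release_month": [7]})
print(f"predicted gross (musd): {model.predict(new_movie)[0]:.1f}")
```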
- Media > Film (1.00)
- Leisure & Entertainment (1.00)